Investigating Political and Demographic Associations in Large Language Models Through Moral Foundations Theory

Smith-Vaniz, Nicole, Lyon, Harper, Steigner, Lorraine, Armstrong, Ben, Mattei, Nicholas

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have become increasingly incorporated into everyday life for many internet users, taking on significant roles as advice givers in the domains of medicine, personal relationships, and even legal matters. The importance of these roles raises questions about how LLMs respond in difficult political and moral domains, especially questions about possible biases. To quantify the nature of potential biases in LLMs, various works have applied Moral Foundations Theory (MFT), a framework that categorizes human moral reasoning into five dimensions: Harm, Fairness, Ingroup Loyalty, Authority, and Purity. Previous research has used MFT to measure differences among human participants along political, national, and cultural lines. While there has been some analysis of LLM responses with respect to political stance in role-playing scenarios, no work so far has directly assessed the moral leanings in LLM responses or connected LLM outputs with robust human data. In this paper we directly analyze the distinctions between LLM MFT responses and existing human research, investigating whether commonly available LLMs demonstrate ideological leanings: through their inherent responses, through straightforward representations of political ideologies, or when responding from the perspectives of constructed human personas. We assess whether LLMs inherently generate responses that align more closely with one political ideology over another, and additionally examine how accurately LLMs can represent ideological perspectives through both explicit prompting and demographic-based role-playing. By systematically analyzing LLM behavior across these conditions and experiments, our study provides insight into the extent of political and demographic dependency in AI-generated responses.
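
As a concrete illustration of this kind of setup, the sketch below (not the authors' code; the helper function, model name, and persona wording are assumptions) scores MFQ-style relevance items with and without a persona system prompt via the OpenAI chat API:

```python
# A minimal sketch of persona-conditioned MFT probing, assuming an OpenAI
# API key is configured. Item wordings follow the style of the MFQ-30;
# rate_item and the persona strings are illustrative, not the paper's setup.
from openai import OpenAI

client = OpenAI()

MFQ_ITEMS = {
    "Harm": "Whether or not someone suffered emotionally.",
    "Fairness": "Whether or not some people were treated differently than others.",
    "Ingroup": "Whether or not someone's action showed love for their country.",
    "Authority": "Whether or not someone showed a lack of respect for authority.",
    "Purity": "Whether or not someone violated standards of purity and decency.",
}

def rate_item(item: str, persona: str | None = None) -> str:
    """Ask the model to rate one item on the standard 0-5 relevance scale."""
    system = persona or "You are a helpful assistant."
    prompt = (
        "When you decide whether something is right or wrong, how relevant is "
        f"the following consideration? Answer with a single digit 0-5.\n{item}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Compare the model's default answers against persona-conditioned ones.
for foundation, item in MFQ_ITEMS.items():
    base = rate_item(item)
    liberal = rate_item(item, "Answer as a politically liberal US adult.")
    print(foundation, base, liberal)
```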


Datasets for Fairness in Language Models: An In-Depth Survey

Zhang, Jiale, Wang, Zichong, Palikhe, Avash, Yin, Zhipeng, Zhang, Wenbin

arXiv.org Artificial Intelligence

Despite the growing reliance on fairness benchmarks to evaluate language models, the datasets that underpin these benchmarks remain critically underexamined. This survey addresses that overlooked foundation by offering a comprehensive analysis of the most widely used fairness datasets in language model research. To ground this analysis, we characterize each dataset across key dimensions, including provenance, demographic scope, annotation design, and intended use, revealing the assumptions and limitations baked into current evaluation practices. Building on this foundation, we propose a unified evaluation framework that surfaces consistent patterns of demographic disparities across benchmarks and scoring metrics. Applying this framework to sixteen popular datasets, we uncover overlooked biases that may distort conclusions about model fairness and offer guidance on selecting, combining, and interpreting these resources more effectively and responsibly. Our findings highlight an urgent need for new benchmarks that capture a broader range of social contexts and fairness notions. To support future research, we release all data, code, and results at https://github.com/vanbanTruong/Fairness-in-Large-Language-Models/tree/main/datasets, fostering transparency and reproducibility in the evaluation of language model fairness.
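
To make the disparity-audit idea concrete, here is a toy sketch of the kind of per-group comparison such a framework performs; the column names and the max-min gap statistic are illustrative assumptions, not the survey's actual metrics:

```python
# A toy disparity audit: compare a per-group metric across demographic
# slices of a benchmark. Data is invented for illustration.
import pandas as pd

# Each row: one benchmark example with a demographic label and a model outcome.
df = pd.DataFrame({
    "group":   ["female", "female", "male", "male", "male"],
    "correct": [1, 0, 1, 1, 1],
})

per_group = df.groupby("group")["correct"].mean()
gap = per_group.max() - per_group.min()   # max-min accuracy disparity
print(per_group.to_dict(), f"gap={gap:.2f}")
```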


Biased AI improves human decision-making but reduces trust

Lai, Shiyang, Kim, Junsol, Kunievsky, Nadav, Potter, Yujin, Evans, James

arXiv.org Artificial Intelligence

Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making. Participants interacted with politically diverse GPT-4o variants on information evaluation tasks. Partisan AI assistants enhanced human performance, increased engagement, and reduced evaluative bias compared to non-biased counterparts, with amplified benefits when participants encountered opposing views. These gains carried a trust penalty: participants underappreciated biased AI and overcredited neutral systems. Exposing participants to two AIs whose biases flanked human perspectives closed the perception-performance gap. These findings complicate conventional wisdom about AI neutrality, suggesting that strategic integration of diverse cultural biases may foster improved and resilient human decision-making.


Aligning LLMs on a Budget: Inference-Time Alignment with Heuristic Reward Models

Nakamura, Mason, Mahmud, Saaduddin, Wray, Kyle H., Zamani, Hamed, Zilberstein, Shlomo

arXiv.org Artificial Intelligence

Aligning LLMs with user preferences is crucial for real-world use but often requires costly fine-tuning or expensive inference, forcing trade-offs between alignment quality and computational cost. Existing inference-time methods typically ignore this balance, focusing solely on the optimized policy's performance. We propose HIA (Heuristic-Guided Inference-time Alignment), a tuning-free, black-box-compatible approach that uses a lightweight prompt optimizer, heuristic reward models, and two-stage filtering to reduce inference calls while preserving alignment quality. On the real-world prompt datasets HelpSteer and ComPRed, HIA outperforms best-of-N sampling, beam search, and greedy search baselines in multi-objective, goal-conditioned tasks under the same inference budget. We also find that HIA is effective under low inference budgets, with as few as one or two response queries, offering a practical solution for scalable, personalized LLM deployment.
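
A minimal sketch of the budgeted two-stage filtering idea, under stated assumptions: both scorers and all function names below are placeholders standing in for the paper's prompt optimizer and reward models.

```python
# Two-stage filtering sketch: a cheap heuristic reward pre-filters candidates
# so only `keep` of them reach the expensive scorer, cutting inference calls.
import heapq

def heuristic_reward(response: str, goal: str) -> float:
    """Cheap stage-1 proxy score, here just keyword overlap with the goal."""
    return sum(w in response.lower() for w in goal.lower().split())

def expensive_reward(response: str, goal: str) -> float:
    """Stage-2 scorer; in practice a learned reward model, stubbed here."""
    return heuristic_reward(response, goal) + len(response) * 0.01

def align(candidates: list[str], goal: str, keep: int = 2) -> str:
    # Stage 1: shortlist top-k candidates under the cheap heuristic, so the
    # expensive model is called `keep` times instead of len(candidates) times.
    shortlist = heapq.nlargest(keep, candidates,
                               key=lambda r: heuristic_reward(r, goal))
    # Stage 2: spend the remaining budget on the expensive reward model.
    return max(shortlist, key=lambda r: expensive_reward(r, goal))

print(align(["Be concise and helpful.", "Here is a long rambling answer..."],
            goal="concise helpful answer"))
```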


KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models

Kim, Seorin, Lee, Dongyoung, Lee, Jaejin

arXiv.org Artificial Intelligence

Large language models (LLMs) often exhibit societal biases in their outputs, prompting ethical concerns regarding fairness and harm. In this work, we propose KLAAD (KL-Attention Alignment Debiasing), an attention-based debiasing framework that implicitly aligns attention distributions between stereotypical and anti-stereotypical sentence pairs without directly modifying model weights. KLAAD introduces a composite training objective combining Cross-Entropy, KL divergence, and Triplet losses, guiding the model to consistently attend across biased and unbiased contexts while preserving fluency and coherence. Experimental evaluation of KLAAD demonstrates improved bias mitigation on both the BBQ and BOLD benchmarks, with minimal impact on language modeling quality. The results indicate that attention-level alignment offers a principled solution for mitigating bias in generative language models.
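
A schematic sketch of a composite objective of this shape; the loss weights, tensor shapes, and where the attention rows and sentence embeddings come from are assumptions, not the authors' implementation.

```python
# KLAAD-style composite loss sketch: LM cross-entropy, plus a KL term pulling
# attention on a stereotypical sentence toward its anti-stereotypical pair,
# plus a triplet loss on sentence embeddings.
import torch
import torch.nn.functional as F

def klaad_loss(lm_logits, targets,            # usual LM head outputs
               attn_stereo, attn_anti,        # attention rows for the pair
               emb_anchor, emb_pos, emb_neg,  # sentence embeddings
               w_kl=0.5, w_tri=0.5):
    ce = F.cross_entropy(lm_logits, targets)
    # Align attention distributions across the sentence pair;
    # kl_div expects log-probabilities as its first argument.
    kl = F.kl_div(attn_stereo.log(), attn_anti, reduction="batchmean")
    tri = F.triplet_margin_loss(emb_anchor, emb_pos, emb_neg, margin=1.0)
    return ce + w_kl * kl + w_tri * tri

# Dummy shapes just to show the call signature.
loss = klaad_loss(torch.randn(4, 100), torch.randint(0, 100, (4,)),
                  torch.softmax(torch.randn(4, 16), -1),
                  torch.softmax(torch.randn(4, 16), -1),
                  torch.randn(4, 32), torch.randn(4, 32), torch.randn(4, 32))
print(f"loss = {loss.item():.3f}")
```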


"Amazing, They All Lean Left" -- Analyzing the Political Temperaments of Current LLMs

Neuman, W. Russell, Coleman, Chad, Dasdan, Ali, Ali, Safinah, Shah, Manan, Meghani, Kund

arXiv.org Artificial Intelligence

"Amazing, They All Lean Left" - Analyzing the Political Temperaments of Current LLMs Abstract Recent studies have revealed a consistent liberal orientation in the ethical and political responses generated by most commercial large language models (LLMs), yet the underlying causes and resulting implications remain unclear. This paper systematically i nvestigates the political temperament of seven prominent LLMs -- OpenAI's GPT - 4o, Anthropic's Claude Sonnet 4, Perplexity (Sonar Large), Google's Gemini 2.5 Flash, Meta AI's L l a ma 4, Mistral 7b Le Chat, and High - Flyer ' s DeepSeek R1 -- using a multi - pronged approach that incl udes Moral Foundations Theory, a dozen established political ideology scales, and a new index of current political controversies. We find strong and consistent prioritization of liberal - leaning values, particularly care and fairness, across most models. Fur ther analysis attributes this trend to four overlapping factors: liberal - leaning training corpora, reinforcement learning from human feedback (RLHF), the dominance of liberal frameworks in academic ethical discourse, and safety - driven fine - tuning practices . We also distinguish between political "bias" and legitimate epistemic differences, cautioning against conflating the two. A comparison of base and fine - tuned model pairs reveals that fine - tuning generally increases liberal lean, an effect confirmed throu gh both self - report and empirical testing. We argue that this "liberal tilt" is not a programming error or the personal preferences of programmers but an emergent property of training on democratic, rights - focused discourse. Finally, we propose that LLMs may indirectly echo John Rawls' famous veil - of - igno rance philosophical aspiration, reflecting a moral stance unanchored to personal identity or interest. Rather than undermining democratic discourse, this pattern may offer a new lens through which to examine collective ethical reasoning. In the course of our research on the ethical logics of currently prominent large language models (Neuman et al. 2025a, b; Coleman et al. 2025), we encountered an interesting finding. The responses to various ethical dilemmas and the explanations of the underlying logics used by these models appear to resonate with the liberal side of the political spectrum. One research analytic we utilize draws on Moral Foundation Theory's five - element typology of foundational moral principles (Graham et al. 2009; Haidt 2012). The five foundations emp hasizing in turn, Care, Fairness, Loyalty, Authority and Purity, are traditionally divided into two clusters. The first two, Care and Fairness, are associated with a liberal political perspective, while conservatives who fully acknowledge the first two more often emphasize the latter three -- Loyalty, Authority and Purity in support of traditional norms.


Probing the Subtle Ideological Manipulation of Large Language Models

Paschalides, Demetris, Pallis, George, Dikaiakos, Marios D.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have transformed natural language processing, but concerns have emerged about their susceptibility to ideological manipulation, particularly in politically sensitive areas. Prior work has focused on binary Left-Right LLM biases, using explicit prompts and fine-tuning on political QA datasets. In this work, we move beyond this binary approach to explore the extent to which LLMs can be influenced across a spectrum of political ideologies, from Progressive-Left to Conservative-Right. We introduce a novel multi-task dataset designed to reflect diverse ideological positions through tasks such as ideological QA, statement ranking, manifesto cloze completion, and Congress bill comprehension. By fine-tuning three LLMs (Phi-2, Mistral, and Llama-3) on this dataset, we evaluate their capacity to adopt and express these nuanced ideologies. Our findings indicate that fine-tuning significantly enhances nuanced ideological alignment, while explicit prompts provide only minor refinements. This highlights the models' susceptibility to subtle ideological manipulation, suggesting a need for more robust safeguards to mitigate these risks.
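
As an illustration of what entries in a multi-task ideological dataset of this kind might look like, here is a hypothetical schema sketch; the field names and example targets are assumptions, not the released dataset:

```python
# Hypothetical entries covering the four task types named in the abstract,
# flattened into (prompt, target) pairs for supervised fine-tuning.
examples = [
    {"task": "ideological_qa",
     "prompt": "Should the state expand public healthcare?",
     "target": "Yes; universal coverage is a basic right."},
    {"task": "statement_ranking",
     "prompt": "Rank by agreement: (a) lower taxes (b) expand welfare",
     "target": "(b) > (a)"},
    {"task": "manifesto_cloze",
     "prompt": "We will ____ the minimum wage.",
     "target": "raise"},
    {"task": "bill_comprehension",
     "prompt": "Summarize the stance this bill takes on immigration.",
     "target": "It restricts pathways to legal residency."},
]

pairs = [(e["prompt"], e["target"]) for e in examples]
print(pairs[0])
```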


Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs

Wachter, Jasmin, Radloff, Michael, Smolej, Maja, Kinder-Kurlanda, Katharina

arXiv.org Artificial Intelligence

We introduce an Item Response Theory (IRT)-based framework to detect and quantify socioeconomic bias in large language models (LLMs) without relying on subjective human judgments. Unlike traditional methods, IRT accounts for item difficulty, improving ideological bias estimation. We fine-tune two LLM families (Meta-LLaMa 3.2-1B-Instruct and ChatGPT 3.5) to represent distinct ideological positions and introduce a two-stage approach: (1) modeling response avoidance and (2) estimating perceived bias in answered responses. Our results show that off-the-shelf LLMs often avoid ideological engagement rather than exhibit bias, challenging prior claims of partisanship. This empirically validated framework enhances AI alignment research and promotes fairer AI governance.
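
The core of such an IRT analysis is the two-parameter logistic (2PL) model; a minimal sketch follows, with the paper's two-stage avoidance modeling deliberately omitted:

```python
# 2PL item response model: the probability of endorsing item j depends on a
# latent ideology trait theta, item difficulty b, and discrimination a.
import math

def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL: P(endorse) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A model with latent position theta=0.4 facing an item of difficulty b=0.1:
print(f"{p_endorse(theta=0.4, a=1.5, b=0.1):.3f}")
```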


Mapping and Influencing the Political Ideology of Large Language Models using Synthetic Personas

Bernardelle, Pietro, Fröhling, Leon, Civelli, Stefano, Lunardi, Riccardo, Roitero, Kevin, Demartini, Gianluca

arXiv.org Artificial Intelligence

The analysis of political biases in large language models (LLMs) has primarily examined these systems as single entities with fixed viewpoints. While various methods exist for measuring such biases, the impact of persona-based prompting on LLMs' political orientation remains unexplored. In this work we leverage PersonaHub, a collection of synthetic persona descriptions, to map the political distribution of persona-based prompted LLMs using the Political Compass Test (PCT). We then examine whether these initial compass distributions can be manipulated through explicit ideological prompting towards diametrically opposed political orientations: right-authoritarian and left-libertarian. Our experiments reveal that synthetic personas predominantly cluster in the left-libertarian quadrant, with models demonstrating varying degrees of responsiveness when prompted with explicit ideological descriptors. While all models demonstrate significant shifts towards right-authoritarian positions, they exhibit more limited shifts towards left-libertarian positions, suggesting an asymmetric response to ideological manipulation that may reflect inherent biases in model training.
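
A sketch of how persona-conditioned PCT answers could be averaged into compass coordinates; the Likert mapping and axis split below are illustrative assumptions (the real PCT uses its own proprietary scoring):

```python
# Map Likert answers onto (economic, social) compass coordinates in [-2, 2].
LIKERT = {"strongly disagree": -2, "disagree": -1, "agree": 1, "strongly agree": 2}

def compass_point(answers: dict[str, str], econ_items: set[str]) -> tuple[float, float]:
    """Average answers into (economic, social) coordinates."""
    econ = [LIKERT[a] for q, a in answers.items() if q in econ_items]
    soc = [LIKERT[a] for q, a in answers.items() if q not in econ_items]
    return (sum(econ) / max(len(econ), 1), sum(soc) / max(len(soc), 1))

answers = {"q1": "agree", "q2": "strongly disagree", "q3": "disagree"}
print(compass_point(answers, econ_items={"q1", "q2"}))  # -> (-0.5, -1.0)
```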


Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory

Münker, Simon

arXiv.org Artificial Intelligence

Contemporary research in the social sciences increasingly utilizes state-of-the-art statistical language models to annotate or generate content. While these models achieve benchmark-leading performance on common language tasks and show exemplary task-independent emergent abilities, their transfer to novel out-of-domain tasks remains insufficiently explored. The implications of the statistical black-box approach (stochastic parrots) have been prominently criticized in the language model research community; however, their significance for novel generative tasks has received far less attention. This work investigates the alignment between personalized language models and survey participants on a Moral Foundations Theory questionnaire. We adapt text-to-text models to different political personas and administer the questionnaire repeatedly to generate a synthetic population of persona and model combinations. Analyzing the intra-group variance and cross-alignment shows significant differences across models and personas. Our findings indicate that adapted models struggle to represent the survey-captured assessment of political ideologies. Thus, using language models to mimic social interactions requires measurable improvements in in-context optimization or parameter manipulation to align with psychological and sociological stereotypes. Without quantifiable alignment, generating politically nuanced content remains unfeasible. To enhance these representations, we propose a testable framework to generate agents based on moral value statements for future research.
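
The intra-group variance analysis described here can be sketched as follows; the repeated-survey scores and group labels are invented for illustration:

```python
# Survey each (model, persona) combination repeatedly and check how stable
# its questionnaire answers are; high variance signals unstable persona adoption.
import statistics

runs = {
    ("model_a", "liberal"):      [4.1, 4.3, 3.9],
    ("model_a", "conservative"): [2.0, 3.8, 1.1],   # unstable persona adoption
}

for group, scores in runs.items():
    print(group, f"mean={statistics.mean(scores):.2f}",
          f"var={statistics.variance(scores):.2f}")
```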